Note: This page's design, presentation and content have been created and enhanced using Claude (Anthropic's AI assistant) to improve visual quality and educational experience.
Week 7 • Sub-Lesson 7

🧪 Hands-On: Activities & Assessment

Putting it all together: cleaning messy data, generating and verifying analysis code, interpreting AI-produced results, and your weekly assessment

What We'll Cover

This final session of Week 7 is entirely practical. You will work through three activities that build directly on everything we have covered this week about AI-assisted data analysis and visualisation. The activities move from the foundational skill of data cleaning, through the critical practice of code verification, to the higher-order challenge of evaluating AI-generated interpretations of data. Each activity is designed to develop a different dimension of the competence you need to use AI safely and effectively in your data work.

By the end of this session, you will have hands-on experience spotting and correcting errors in datasets, a tested workflow for verifying AI-generated analysis code before trusting its results, and sharpened instincts for distinguishing sound data interpretation from plausible-sounding nonsense. The weekly assessment then asks you to bring all of these skills together in a complete AI-assisted data analysis.

🧹 Activity 1: The Data Cleaning Challenge

Objective

Discover how AI handles messy, real-world data, and where it silently fails. Data cleaning is the unglamorous foundation of all analysis, yet it is where most errors are introduced and where AI tools are simultaneously most helpful and most dangerous. This exercise gives you a dataset with planted errors of exactly the kinds that appear in real research data, and asks you to use AI to find and fix them.

Setup

  1. Obtain or create a messy dataset. Use a dataset from your own field if you have one, or create a small spreadsheet (50-100 rows, 6-8 columns) with deliberately planted errors. Your dataset should include a mix of the following problems:
    • Missing values: some blank cells, some coded as "N/A", some as "-999", some as "NA"
    • Inconsistent formatting: dates in mixed formats (01/03/2024, 3 Jan 2024, 2024-01-03), inconsistent capitalisation in categories
    • Outliers: a few values that are clearly impossible (e.g., an age of 350, a temperature of 9999)
    • Duplicate rows: two or three exact or near-exact duplicates
    • Unit inconsistencies: some values in metres and others in centimetres in the same column, or some weights in kilograms and others in pounds
    • Encoding errors: garbled characters in text fields, trailing whitespace in category labels
  2. Record your planted errors. Before giving the dataset to AI, make a private list of every error you planted. This is your answer key; you will use it to evaluate how well the AI does.
  3. Ask AI to clean the dataset. Upload or paste the data into your chosen AI tool and prompt it: "This dataset has data quality issues. Please identify all problems you can find, explain each one, and provide cleaned data." Do not hint at what kinds of errors exist.
  4. Compare the AI's findings against your answer key. Work through the AI's response systematically. For each error you planted, check whether the AI found it, correctly diagnosed it, and proposed an appropriate fix.
  5. Test the AI on ambiguous cases. Ask the AI about specific rows or values where the "correct" cleaning decision depends on domain knowledge. For example, if a value is unusual but not impossible in your field, does the AI flag it as an error or leave it alone? Does it make assumptions about your domain that are wrong?
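If you prefer to build the messy dataset in code rather than a spreadsheet, a small pandas sketch like the one below can plant several of the error types listed above. Every column name and value here is illustrative; scale the idea up to the 50-100 rows the exercise asks for.

```python
import pandas as pd

# Illustrative messy dataset with planted errors; all names and values are made up
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 4],                          # last row is a duplicate
    "date": ["01/03/2024", "3 Jan 2024", "2024-01-03", "05/02/2024", "05/02/2024"],  # mixed formats
    "group": ["Control", "control", "TREATMENT", "Treatment ", "Treatment "],        # case + trailing space
    "age": [34, 41, 350, 29, 29],                    # 350 is an impossible outlier
    "height": [1.72, 168, 1.80, "N/A", "N/A"],       # metres vs centimetres, plus a text missing code
    "score": [55.2, None, -999, 61.0, 61.0],         # blank cell and a -999 sentinel
})

# The private answer key (step 2): one entry per planted error
answer_key = [
    "duplicate row (id 4)",
    "mixed date formats in 'date'",
    "inconsistent capitalisation and trailing whitespace in 'group'",
    "impossible value: age 350",
    "unit inconsistency in 'height' (metres vs centimetres)",
    "missing-value codes: blank, 'N/A', -999",
]

df.to_csv("messy_dataset.csv", index=False)
```

Keeping the answer key in the same script makes step 2 automatic: the list is written down before the AI ever sees the data.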

What to Record

  • How many of your planted errors did the AI find? Which types of errors was it best at catching, and which did it miss?
  • Did the AI find any "errors" that were not actually errors (correct data that it mistakenly flagged as problematic)?
  • For the errors it did catch, were its proposed fixes appropriate? Did it make domain-specific assumptions that were wrong?
  • What cleaning decisions require human judgement that AI cannot provide? Where is the boundary between automation and expertise?
  • If you used the AI's cleaned dataset without checking, what errors would have remained in your data, and what would those errors do to your analysis?

โš ๏ธ The Silent Error Problem

The most dangerous AI cleaning errors are not the ones it misses; they are the ones it "fixes" incorrectly without telling you. An AI might silently convert all your dates to a single format but get the day/month order wrong for ambiguous dates like 03/04/2024. It might remove a value it considers an outlier that is actually a genuine and important measurement. Always verify the AI's cleaning decisions, not just its error detection.
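The ambiguous-date failure is easy to reproduce. In pandas, for example, the same string parses to two different calendar dates depending on the day-first assumption (a minimal sketch):

```python
import pandas as pd

ambiguous = "03/04/2024"  # 4 March or 3 April?

as_month_first = pd.to_datetime(ambiguous, dayfirst=False)  # assumes month/day/year
as_day_first = pd.to_datetime(ambiguous, dayfirst=True)     # assumes day/month/year

print(as_month_first.date())  # 2024-03-04 (4 March)
print(as_day_first.date())    # 2024-04-03 (3 April)

# A cleaner that standardises every date without asking which convention
# your data uses will silently pick one of these for each ambiguous row.
```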

💻 Activity 2: Code Generation and Verification

Objective

Practice the complete cycle of describing an analysis in natural language, receiving AI-generated code, and then rigorously verifying that the code does what you intended. This is the skill at the heart of AI-assisted data analysis: you must be able to bridge the gap between your research question and the code that answers it, even if you did not write that code yourself. The verification step is not optional: it is the difference between research and guessing.

Setup

  1. Choose a dataset and research question. Use a dataset you are familiar with, ideally from your own research or a well-known public dataset in your field. Formulate a specific, answerable research question that requires at least two analytical steps (e.g., data transformation plus statistical test, or grouping plus visualisation).
  2. Describe your analysis to the AI in plain language. Write a clear natural-language description of what you want to do with the data. Do not use technical statistical terms unless they are essential; describe the analysis the way you would explain it to a colleague from a different field. For example: "I want to compare the average response times between the three treatment groups, check whether the differences are statistically meaningful, and create a chart showing the distribution of response times in each group."
  3. Receive and read the code before running it. When the AI generates code, read through it line by line before executing anything. Even if you are not fluent in the programming language, try to follow the logic. Note anything you do not understand; these are the places where errors are most likely to hide.
    • Ask the AI to add comments explaining each section if the code is not already well-commented
    • If you cannot follow the logic at all, ask the AI to explain step by step what the code does
  4. Verify the code using at least three of the following techniques:
    • Manual spot-check: Pick 3-5 specific data points and trace the calculation by hand. Does the code produce the same result you get with a calculator or spreadsheet?
    • Edge case testing: What happens with missing values, zeros, or extreme values? Does the code handle them correctly or crash?
    • Sanity check on output: Do the results make sense given what you know about the data? If the mean is supposed to be around 50 and the code reports 5000, something is wrong.
    • Alternative implementation: Ask a different AI tool (or the same tool in a new conversation) to write code for the same analysis. Compare the two implementations. Do they produce identical results?
    • Statistical assumption check: Does the code verify the assumptions of the statistical tests it uses? If it runs a t-test, does it check for normality and equal variances first?
    • Ask AI to critique itself: Paste the code back to the AI and ask: "What could go wrong with this analysis? What assumptions does this code make that might not hold for my data?"
  5. Document every discrepancy. If any verification technique reveals a problem (a wrong result, a missing assumption check, an edge case that breaks the code), document it precisely. Then ask the AI to fix the issue and verify the fix.
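Several of these techniques can be scripted rather than done by eye. The sketch below uses made-up numbers standing in for your data: it spot-checks a mean against plain arithmetic, probes a missing-value edge case, and runs assumption checks before a t-test (assuming scipy is available):

```python
import numpy as np
from scipy import stats

# Made-up measurements standing in for AI-analysed data
group_a = np.array([48.0, 51.5, 49.2, 50.8, 47.3])
group_b = np.array([53.1, 55.0, 52.4, 54.2, 56.3])

# Manual spot-check: recompute the mean with plain arithmetic
assert abs(group_a.mean() - sum(group_a) / len(group_a)) < 1e-9

# Edge case: a missing value. Does the pipeline surface it or hide it?
with_nan = np.append(group_a, np.nan)
print(np.mean(with_nan))     # nan: the gap propagates and stays visible
print(np.nanmean(with_nan))  # the gap is silently ignored, shrinking n

# Assumption checks before a t-test
_, p_norm_a = stats.shapiro(group_a)       # normality of group A
_, p_norm_b = stats.shapiro(group_b)       # normality of group B
_, p_var = stats.levene(group_a, group_b)  # equality of variances

# Only then run the test; fall back to Welch's t-test if variances differ
t, p = stats.ttest_ind(group_a, group_b, equal_var=(p_var > 0.05))
print(f"t = {t:.2f}, p = {p:.4f}")
```

With five values per group these checks have little power, but the structure (spot-check, edge case, assumptions, then test) is the point; apply it to your real sample sizes.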

What to Record

  • How well did the AI translate your natural-language description into code? Did it capture your intent correctly, or did it make assumptions about what you wanted?
  • Which verification technique found the most issues? Which technique do you think is most important for your typical analyses?
  • Did the code make any statistical assumptions that were not appropriate for your data? Did it check those assumptions or just proceed?
  • If you had run the code without verification, what errors would have ended up in your results?
  • How confident are you in the final, verified code? What residual uncertainties remain?

💬 Discussion

Most students find that AI-generated code is syntactically correct and runs without errors, but that is not the same as being analytically correct. The most common problems are: choosing an inappropriate statistical test for the data structure, failing to check assumptions, handling missing data by silently dropping rows (changing your effective sample size), and producing visualisations that technically display the data but obscure the important patterns. The verification step catches these problems before they become errors in your research.
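The silent row-dropping problem is worth seeing concretely. In pandas, aggregations skip missing values by default, so the effective sample size shrinks without any warning (illustrative numbers):

```python
import numpy as np
import pandas as pd

scores = pd.Series([50.0, 52.0, np.nan, 48.0, np.nan, 51.0])

print(len(scores))       # 6 rows on paper
print(scores.count())    # only 4 values actually used
print(scores.mean())     # 50.25, a mean over n = 4, not n = 6

# pandas skips NaN by default (skipna=True); code that never reports the
# effective n hides the shrunken sample from you.
print(scores.mean(skipna=False))  # nan: forces the gap to surface
```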

🔎 Activity 3: The Interpretation Challenge

Objective

Develop your ability to evaluate AI-generated interpretations of data analysis results. Generating code and producing numbers is only half of data analysis; the other half is interpreting what those numbers mean. AI tools are increasingly capable of producing narrative interpretations of statistical output, but these interpretations can range from perfectly accurate to dangerously misleading. This exercise trains you to tell the difference.

Setup

  1. Run an analysis on a dataset you understand well. Use the code from Activity 2, or run a fresh analysis on data from your field. The analysis should produce clear statistical output: means, p-values, confidence intervals, effect sizes, or similar quantitative results.
  2. Generate four AI interpretations. Ask the AI to interpret the results four separate times, using four different prompts. Start a fresh conversation for each interpretation so that earlier responses do not influence later ones:
    • Interpretation A: "Here are my analysis results. What do they mean?" (minimal context)
    • Interpretation B: "Here are my results. I expected to find [state your hypothesis]. Do the results support this?" (leading prompt)
    • Interpretation C: "Here are my results. Please provide a critical interpretation, including what the results do NOT show and what alternative explanations exist." (critical prompt)
    • Interpretation D: "Here are my results. Interpret them as if you were writing the discussion section of a peer-reviewed paper in [your field]." (academic framing)
  3. Evaluate each interpretation. Read all four interpretations carefully and assess each one against the following criteria:
    • Accuracy: Are all factual claims about the data correct? Does the interpretation correctly report what the numbers show?
    • Overclaiming: Does the interpretation claim more than the data supports? Does it confuse correlation with causation, or treat a non-significant trend as meaningful?
    • Completeness: Does it mention important limitations, alternative explanations, or caveats? Or does it present a single clean narrative?
    • Bias: Does the interpretation change depending on how you framed the prompt? Did the leading prompt (B) produce a more confirmatory interpretation?
  4. Write your own interpretation. After evaluating all four AI versions, write your own interpretation of the results in 150-200 words. Draw on the best elements of the AI interpretations, correct the errors, and add the domain expertise that only you can provide.

What to Record

  • Which of the four interpretations was most accurate? Which was most misleading? What made the difference?
  • Did the leading prompt (Interpretation B) produce confirmation bias in the AI's response? How did it differ from the others?
  • What did the critical prompt (Interpretation C) catch that the others missed? Was it genuinely critical, or did it just add generic caveats?
  • What domain-specific knowledge did you bring to your own interpretation that none of the AI versions included?
  • If a reader saw only one of the AI interpretations without the data, would they come away with a correct understanding of what the analysis showed?

โš ๏ธ The Confirmation Bias Trap

This exercise usually demonstrates a subtle but important finding: AI interpretations are highly sensitive to framing. When you tell the AI what you expected to find, it tends to produce an interpretation that confirms your expectations, even when the data does not clearly support them. This is particularly dangerous because it feels like independent validation of your hypothesis when it is actually just the AI echoing your own priors back to you. In your real research, always generate interpretations using the critical prompt (C) first, before layering in your own expectations.

๐Ÿ“ Weekly Assessment

AI-Assisted Data Analysis (1000 words + code)

This week's assessment asks you to conduct a complete data analysis using AI tools, then document and critically reflect on the process. The assessment tests your ability to use AI as a genuine analytical partner, not just a code generator, while maintaining the rigour and critical judgement that separates research from automated number-crunching.

Requirements

  1. Choose a dataset and formulate a research question. Use data from your own research, or select a publicly available dataset relevant to your field. The research question should require meaningful analysis, not just descriptive statistics. You need to do something with the data that involves at least one inferential step (comparison, relationship, prediction, classification, or similar).
    • If using your own data, ensure you have appropriate ethical clearance for sharing it in a course context
    • If using a public dataset, briefly explain why this dataset is appropriate for your question
  2. Conduct the analysis using AI tools (1000 words + code). Write up the analysis as if it were part of a research paper: describe the data, explain your analytical approach, present the results (including at least one visualisation), and interpret the findings. Include the AI-generated code (annotated with comments) as an appendix. The 1000-word limit covers the written analysis only; code and figures are additional.
    • You may use any AI tools: ChatGPT, Claude, Gemini, Copilot, or specialised data analysis tools
    • The write-up should follow standard academic structure: introduction to the question, methods, results, discussion
  3. Include a Verification Report (minimum 300 words, does not count toward 1000). Document how you verified the AI-generated code and results. You must apply at least three verification techniques from Activity 2. For each technique, describe what you checked, what you found, and what (if anything) you corrected.
    • Be specific: "I spot-checked the mean of column X by calculating it in a spreadsheet and got 47.3, which matched the code output" is far better than "I verified the results"
    • If you found errors during verification, describe the original error, how you caught it, and how you fixed it
  4. Include a Critical Commentary (minimum 200 words, does not count toward 1000). Reflect on the limitations of your AI-assisted analysis. Address: What could the AI not do that you had to do yourself? Where did the AI's suggestions require domain knowledge to evaluate? What would you do differently next time? How confident are you in the results, and what residual uncertainties remain?
  5. Include a Disclosure Statement describing the role of AI in your analysis โ€” similar to those you practised in Week 6. Specify which tools you used, what they contributed (data cleaning, code generation, visualisation, interpretation), and what you contributed independently.

Assessment Criteria:

Analysis Quality (30%)

Appropriate research question, sound analytical approach, correct statistical methods, clear and informative visualisation, and accurate interpretation of results. We are looking for evidence that you understand the analysis, not just that you ran the code. Can you explain why this method is appropriate for this data? Do the conclusions follow from the evidence?

Verification (30%)

Thorough, specific, and honest documentation of how you verified the AI-generated code and results. At least three verification techniques applied with concrete evidence of what was checked and what was found. The best verification reports reveal a researcher who does not trust output until it has been independently confirmed, and who can explain exactly how they confirmed it.

Critical Commentary (25%)

Thoughtful reflection on the limitations of AI-assisted analysis, the role of domain expertise, and the boundary between what AI can and cannot do with data. We want to see genuine critical thinking: not generic disclaimers but specific observations about where AI helped, where it fell short, and what you learned about using AI tools for data work in your field.

Documentation (15%)

Clear code comments, well-structured write-up, appropriate disclosure statement, and overall presentation quality. The code appendix should be readable and well-annotated. The disclosure statement should be specific enough that a reader could understand exactly what role AI played in the analysis.

📤 Submission

Upload to Amathuba by the deadline indicated on the activity. Include the verification report, critical commentary, and disclosure statement as clearly marked sections within your submission. The code appendix should be included as a separate section or attachment. The verification report, critical commentary, and disclosure statement do not count toward the 1000-word limit for the analysis itself.

Week 7 Summary & Key Takeaways

  • AI can accelerate every stage of data analysis, but acceleration without verification is just faster error production. The speed of AI-generated code and analysis is a genuine advantage, but only when paired with systematic checking. Unverified AI output is not research; it is automation with unknown error rates.
  • Data cleaning is where AI is simultaneously most helpful and most dangerous. AI tools excel at detecting formatting inconsistencies and common data quality issues, but they make silent assumptions about domain-specific decisions that can corrupt your data in ways that are difficult to detect downstream.
  • Always read AI-generated code before running it, and verify the output after running it. Syntactically correct code that executes without errors is not the same as analytically correct code that answers your research question. The most dangerous bugs produce plausible-looking but wrong results.
  • AI interpretations of data are highly sensitive to how you frame the prompt. Tell the AI what you expected to find, and it will tend to confirm your expectations. This is not independent validation; it is confirmation bias amplified by technology. Always start with a neutral or critical framing.
  • Statistical assumption checking is non-negotiable. AI tools frequently apply statistical tests without verifying whether the assumptions of those tests are met by your data. A t-test on non-normal data, a correlation on non-linear relationships, a regression with violated independence โ€” these are not edge cases; they are the default failure mode of AI-generated analysis code.
  • Domain expertise is irreplaceable. AI can generate code and produce numbers, but it cannot tell you whether those numbers make sense in the context of your field. An outlier that AI removes might be your most important finding. A "significant" result might be meaningless in practical terms. Only you can make these judgements.
  • Natural language to code is powerful but lossy. Every translation from your research question to a prompt to AI-generated code involves information loss and potential misinterpretation. The clearer and more specific your description, the better the code, but some gap between intention and implementation always remains.
  • Reproducibility requires documentation. If you use AI to generate analysis code, documenting exactly what you asked for and what the AI produced is essential for reproducibility. Another researcher should be able to understand not just what you did, but why you made the choices you made and how you verified the results.
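One of the failure modes named above, correlation applied to a non-linear relationship, takes only a few lines to demonstrate: a perfect but non-linear dependence can still yield a Pearson r of essentially zero (assuming scipy is available):

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 200)
y = x ** 2  # perfect, but non-linear, dependence on x

r, p = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f}")  # essentially zero despite the exact relationship
```

AI-generated code that reports this r without plotting the data would conclude "no relationship"; a single scatter plot, or your own domain knowledge, catches it immediately.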

Looking ahead: Next week, we turn to Critical Evaluation and Limitations of AI-Generated Content. Having now used AI tools for literature review, writing, ideation, and data analysis, we will step back and develop systematic frameworks for evaluating AI output across all of these domains โ€” understanding not just how to use AI, but how to judge when its contributions are trustworthy and when they are not.